Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer

Neural Information Processing Systems

Although autoregressive models have achieved promising results on image generation, their unidirectional generation process prevents the resultant images from fully reflecting global contexts. To address this issue, we propose \emph{Draft-and-Revise}, an effective image generation framework with a \emph{Contextual RQ-Transformer} that considers global contexts during the generation process. As a generalized VQ-VAE, RQ-VAE first represents a high-resolution image as a sequence of discrete code stacks. After code stacks in the sequence are randomly masked, the Contextual RQ-Transformer is trained to infill the masked code stacks based on the unmasked contexts of the image. We then propose a two-phase decoding procedure, Draft-and-Revise, that enables the Contextual RQ-Transformer to generate an image while fully exploiting its global contexts during the generation process.
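The two-phase procedure described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration with a toy infiller standing in for the trained Contextual RQ-Transformer: the function name `draft_and_revise`, the masking ratio, and the number of revision rounds are assumptions for illustration, not the paper's exact hyperparameters.

```python
import random

def draft_and_revise(infill, seq_len, num_revise_rounds=2, mask_ratio=0.5, seed=0):
    """Two-phase decoding sketch: draft a full sequence by infilling from an
    all-masked state, then repeatedly re-mask random subsets and infill them
    again so revised positions condition on the surrounding (global) context."""
    rng = random.Random(seed)
    MASK = None  # placeholder for a masked code stack
    # Draft phase: start fully masked and infill every position.
    seq = infill([MASK] * seq_len)
    # Revise phase: re-mask a random subset and infill it, several rounds.
    for _ in range(num_revise_rounds):
        idx = set(rng.sample(range(seq_len), int(seq_len * mask_ratio)))
        masked = [MASK if i in idx else tok for i, tok in enumerate(seq)]
        seq = infill(masked)
    return seq

# Toy "infiller": replaces masked positions with 0. A real model would
# predict discrete code stacks conditioned on the unmasked context.
toy_infill = lambda s: [0 if t is None else t for t in s]
print(draft_and_revise(toy_infill, seq_len=8))  # → [0, 0, 0, 0, 0, 0, 0, 0]
```

In the actual framework, `infill` would be the trained model producing RQ-VAE code stacks; the key design point the sketch shows is that every revision round conditions on an already-complete draft, unlike unidirectional autoregressive decoding.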


Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer (Supplementary Material) — A. Implementation Details, A.1 Details of RQ-VAE

Neural Information Processing Systems

In this section, we show additional examples of images generated by our Contextual RQ-Transformer. We use a Contextual RQ-Transformer with 1.4B parameters trained on ImageNet for class-conditional image generation.



